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Abstract 

We present a new parallel algorithm for the exact diagonalization of the 
t — t'-Hubbard model with the Lanczos-method. By invoking a new scheme 
of labeling the states we were able to obtain a speedup of up to four on 16 
nodes of an IBM SP2 for the calculation of the ground state energy and an 
almost linear speedup for the calculation of the correlation functions. Using 
this algorithm we performed an extensive study of the influence of the next- 
nearest hopping parameter t' in the t — i'-Hubbard model on ground state 
energy and the superconducting correlation functions for both attractive and 

repulsive interaction. 
PACS: 74.20 



1 



Typeset using REVT[t;X 



I. INTRODUCTION 



The Hubbard model [0 is one of the generic models in many particle physics. Due to 
difficulties with the analytic solution of the Hubbard model in two dimensions, this model 
is intensively studied with various numerical algorithms, e.g. exact diagonalization 0, [Q, 
^ [^], stochastic diagonalization [^], ||^ and quantum Monte Carlo algorithms 1^, [0], [PD| , 

The single band Hubbard model with additional next nearest neighbor hopping is given 
in real the space by: 

<i,j>,a «i,j»,a i 

The sum < i,j > is over the nearest neighbors and << i,j » is the sum over the next 
nearest neighbors. c| ^ is the creation operator for an electron with spin a on site i and rii^a- 
is the corresponding number operator. Throughout this article we take t = 1 as energy unit. 
In the momentum space this Hamiltonian reads as: 

k,(7 k,p,q 

with 

Ek = — 2t( cos{kx) + cos(A;j^)) — 4t' cos(A;^) ■ cos(fcj^) . (3) 

The usual Hubbard-model (t' = 0) has a Van Hove singularity in the density of states 
at half filling in the noninteracting case {U = 0). It is possible to move this Van Hove 
singularity to any electron filling by extending the Hubbard model by an additional next 



nearest neighbor hopping t'-term in the kinetic energy. The t — t'-Hubbard model |jT2[ shows 



superconductivity for repulsive interactions U with da;2_y2 -symmetry |Tl|] and for attractive 



interaction with on site s-symmetry , |TJ] , . 

As a measure for the superconductivity we calculate the reduced two particle density 
matrix according to the concept of Yang [ITB]. From this two particle density matrix we 
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calculate the two particle correlation functions for different symmetries and additionally we 



calculate the vertex correlation function |T7| 



The Van Hove scenario predicts an increase of Tc for fillings close to a Van Hove sin- 
gularity [0. We study the influence of the Van Hove singularity on the superconducting 



correlation functions by modifying the t'-hopping parameter for fixed fillings. Here we use 
the Lanczos-algorithm [|l^], 0] as exact diagonalization technique to determine the ground 
state and from there ground state properties of the t — t'-Hubbard model. 

The basic limitations on the calculations of large system sizes with the Lanczos-algorithm 
is the huge memory consumption of this method. But to our surprise we found that even for 
Hubbard systems of size 4x4, which the Lanczos method is capable of handling, the CPU 
consumption of the simulations was substantial and made a detailed scan of the parameter 
space given by interaction strength U, filling n and next nearest neighbor hopping t' almost 
impossible. We therefore implemented the Lanczos-method on IBM SP2 parallel computer 
with MPI (Message Passing Interface) for the communication between the processes to speed 
up the calculations. 

II. EXACT DIAGONALIZATION WITH THE LANCZOS-ALGORITHM 

Before turning our attention to the parallel techniques we outline the basic concept of 



the Lanczos method [T^. In the case of the exact diagonalization the Hamiltonian H of 
the system is written in matrix or Heisenberg representation. One chooses an orthonormal 
single particle basis, to represent the many-particle states. For the t — t'-Hubbard model we 
use the momentum-space representation. Each many particle basis state is a product 
of the spin up |$i|,t) and the spin down component: 

1$,) = |<l>,^,t) ® = cLt^Lt • • • 4„^,T ■ iiA2,l ■ ■ ■ ' (4) 

where c\. ^ is the creation operator of an electron with momentum ki and spin a. |0) is the 
vacuum state. 
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Each many particle state can be represented by means of the basis states and 
coefficients af 

M 

|v[/) = 5:a,|$,) . (5) 

i-l 

The Hilbert space of a system consisting of a lattice of L sites and electrons with spin 
up and rii electrons with spin down has the dimension 

M^M,.M,^(_^).(2 . (6) 

As the size of the Hilbert space also determines the size of the computer memory, that is 
used in the calculation, one tries to reduce the Hilbert space. The usual way to restrict the 
size of the Hilbert space M is to apply symmetries of the lattice and the Hamiltonian. The 
translation invariance is a symmetry easily implemented when solving the t — t'-Hubbard 
model in momentum space representation with the Lanczos algorithm. The Hilbert space 
decomposes into subspaces containing only basis states which have the same total 
momentum K: 

i=i j=i 

where fcj and pj are the momenta of the creation operators c^. ^ resp. c^. ^ in eq. ^. We 
denote the number of states of particles with the same spin with the total momentum as 
M^^ . In a 4 X 4 system with = 4 the number of states varies for different momentum 
between 112 and 120. The states must have the total momentum Ki = K — K-^, so 

that the product state |$i^,t) ® I'^'j^.i) in the subspace . Altogether the subspace M/^ 
has the size 

= E ■ Mi_^^ . (8) 

For various fillings = rii table | gives an overview. 

When storing the coefficients with 8 Byte floating point numbers the memory demand 
for one state is 8 ■ M = 8 ■ 4368^ ~ 146 MByte for a lattice with 16 sites and = = 5 



electrons. Even if one uses the translation symmetry it remains a demand of 8 ■ M/^ ^ 9 



MByte for each state. In the standard Lanczos-algorithm [BO] it is necessary to store 



three many particle states |\E') to calculate the ground state and the ground state energy. 



We implemented the Lanczos-iteration ||20|| : 

7. = (^.l(^^-|*,)-/3,-i|^,-i)) , (9) 

with the coefficients 7^ (diagonal) and f3j (offdiagonal) of the tridiagonal matrix Tj and the 
Lanczos- vectors I^E'j). From this tridiagonal matrix Tj we calculate the eigenvalues of H. In 
the Lanczos- scheme it is not necessary to transform the matrix H. Therefore it is even not 
necessary to store the matrix elements they are only calculated, when they are 

needed for the further evaluation of the Lanczos-iteration (eq. 

III. PARALLELIZATION OF THE LANCZOS ALGORITHM 

In this section we concentrate our effort on how to speed up the simulations with the 
Lanczos method. The determination of the ground state properties of a Hamiltonian with 
the Lanczos-algorithm consists of two main parts concerning the consumption of CPU-time. 
First the ground state energy Eq and the ground state |\E'o) are calculated. The second main 



part is the determination of the two-particle density matrix |TH]: 



EQiiQ;j($i|4,t4,iCfc3,iCfc4,Tl*i) 



(10) 



as the main observable of interest. 

A. Algorithms for handling the basis states 

To handle an arbitrary state |\I') it is necessary to know for each basis state the 
indices fcj and Pi of the creation operators in eq. H and the weights ttj. 



A simple possibility for such an algorithm is the bitcoding or bitrepresentation of the 
basis states. Here the momenta ki are labeled from 1 to L and the momenta Pi from L + 1 
to 2L. Then the states are expressed by an one dimensional array of bits, which are 
one for occupied sites, and otherwise the bits are zero. If one interprets this array as binary 
representation of an integer number one has an algorithm to assign each basis state 1$^) an 
index j. As example we take a lattice with 4 sites and each two electrons with spin up and 
down: 

1$^.) = 4 .|.4 -|. -4^4 JO) ^ 1010 0110 ^ 166 = J ^ . (11) 

For any many particle state only the coefficients aj need to be stored. 

But the bitrepresentation has the great disadvantage, that a huge amount of memory 
is wasted, because the bitrepresentation of many integer numbers j does not correspond to 
a valid basis state. For a system with L latticepoints in an array of the length (2^)^ one 
only stores M <^ (2-^)^ numbers. For the above example L = 16 and n| = nj^ = 5 there is 
(2i6)VMf ^ 3602. 

Therefore it is desirable to use another algorithm ^ ])■ One example is the hashing 
algorithm [ pi[] . 

We developed a new algorithm. Though we only present results for the momentum 
space representation, this algorithm can also be implemented very efficiently for the exact 



diagonalization in real space |22 . 



B. Numbering of the states 

First we are numbering the momenta from 1 to L (fcj G {1, . . . , L}). Second we define 
ki < k2 < . . . < kn„ to fix the sign. 
The state with the number 1 is 

l<fi,.) = cLct...4.,J0) . (12) 
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In |$2,(t) the electron cj^^ ^ moves from to + 1. In \^ L-ncr+i,^) this "last" electron has 
reached the latticepoint L. Next in |$L_n^+2) the two creation operators with the highest 
index are increased by one, 

|*L-n.+2,<7) = c\^aC2,a ■ ■ ■ i -2,aCi<„<xCL+l,a 1 0) • (l^) 

Then the " last" creation operator moves. These are the states with the numbers L — n^ + 2 
to 2L — 2n„ + 1. Next the "last" two electrons go to the sites n„ + 1 and n„ + 2. If both 
electrons have reached the final two lattice sites (L — 1, L) three electrons move in the same 
manner through the lattice. 

One gets the number v of the state |$i„,o-) with 



, Tin- — m , 



El E ill . (14) 



m=l \j=l^_i+l 

where Im is the position of the electron m in the lattice. 

Using the translational invariance of the Hamiltonian in the k-space representation 
means, that we keep the total momentum K of the basis states fixed. In this case 
one can only choose the "spin-up" part of the basis state free and take such a "spin-down" 
state, that eq. |^ is full filled. This means, we must use another convention to label the 
states. 

We generate all states |$j,o-) in the sequence as described above and calculate the mo- 
mentum. For each momentum we count independently the indices. 

The coefficients are now labeled in following way: First we combine with state number 
1, momentum 1 and spin up with all possible states with spin down for a given K. Then 
we do the same with state number 2, momentum 1 and spin up. Next we switch to state 1 
with momentum 2 and spin up and so on. 

C. Dividing up the memory 

In the parallel algorithm each of the P processes stores the coefficients of Mp = / P 
basis states, is stored on process p = i/Mp. 



D. Matrix-vector multiplication 



In the Lanczos method it is necessary to perform a matrix- vector multiphcation between 
the matrix H and a Lanczos- vector In the parallel algorithm this is carried out in the 
following way: For i — 1, . . . Mp: 

• Each process p calculates the index numbers j of the basis states H ■ |$j_|_(p_i)Mp), 
the multiplication results for the i-th state and the process, on which the states j are 
stored. 

• Each process exchanges the multiplication results and the numbers j of the basis states, 
which are not stored on the process, with process pj = j /Mp. 

E. Calculation of the reduced two particle density matrix 

To calculate the two-particle density matrix gki,k2,k3M i^i efficient way, one only cal- 
culates the elements 

{^m\cl^,'tcl^,lCk,,lCk^,^\^n) , (15) 

which are nonzero. That means one takes a basis state |$n), one of the possible combi- 
nations of ki, k2, and k^ and applies Cki,'['^l:2,i^k3,i^k4,i \^n), afterwards one calculates 
the index number of this transformed basis state 

Each process stores the complete matrix Qki^k2,k3M- This matrix takes for example in the 
4x4 system 8 • 16'^ Byte or 512 KByte of memory. 

In this algorithm one has to calculate A^i = ■ expectation values, compared to 
= (M^y expectation values that would be calculated if one took all combinations 
and into account. In a 4 x 4 system with n-^ — rii — 5 electrons N2/N1 fa 18 and the 
amount of saved CPU-time is significant. But in a 4 x 4 system with only — ni — 3 
electrons N2/N1 pa 0.3 and this algorithm is slower. 
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To calculate Qk^MMM ^ similar way for the exchange of the weights and numbers 

as for matrix- vector multiplication (see section p^llDp . 

At the end the values of the arrays QkiMMM of all processes are summed on one process. 

IV. PERFORMANCE ANALYSIS OF THE PARALLEL CODE 

First we take a look on the dependence of the CPU time of one Lanczos iteration (eq. ^ 
for a different number of processes. As example we use a 4 x 4 lattice with three different 
numbers of electrons (fig. |l|). 

The small decay from 1 to 2 processes is due to communication between processes which 
is only necessary for more than one process. Then one sees as expected a decrease of 
computation time. As expected this decrease vanishes for an increasing number of processes, 
since the communication is growing with the number of processes. 

In figure ^ we examine the CPU-consumption for the two-particle density matrix for the 
same system size and fillings as in figure ^ Here the gain of time is much larger than for the 
calculation of the energy. In this case more computation is performed for the determination 
of one matrix element. For more than 2 processes the dependence of nodes and CPU-time 
is nearly linear. 

This can be understood, if we look on the numbering of the states. Most of the states 
H ■ are stored on the same node and only for a fraction of these states the weights must 
be interchanged with an other process. 

Summarizing the algorithm for this parallel implementation of the Lanczos-iteration 
and the determination of the two-particle density matrix is a coarse grained algorithm and 
therefore achieves a good speed-up on a parallel computer like the IBM SP2, with only 
some, but very powerful, processors. For very many processors the communication grows 
dramatically and no further speed-up is reachable. 
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V. NUMERICAL RESULTS 



Now we want to turn our attention from the technical points of view to physical properties 
of the t — t'-Hubbard model. Especially we study the influence of the next nearest hopping 
parameter t' on the ground state energy and superconducting correlation functions in the 
ground state of a 4 x 4 cluster. We focus to ng = = = 5 electrons which corresponds 
to a filling (n) = 0.625. This is a so called closed shell situation, which can be also handled 



with the projector quantum Monte Carlo method |15 |. 

First we study the ground state energy Eq in the attractive t — t'- Hubbard model ( fig. 
^). For small and intermediate interaction strength {\U\ < 10) there is a visible difference 
between the energy with t' = and t' = —0.22. This difference results from the changes in 
the structure of Ek (eq.|^) with t'. But for large interaction strength \U\ the influence of the 
kinetic part is vanishing. In this interaction regime the ground state energy is approximately 
linear with U and approaches slowly the energy Urif. of the system without hopping. 

Next we turn our attention to the superconducting correlation functions. In the concept 
of off diagonal long range order the largest eigenvalue and eigenvector of Qk^MMM 
calculated. The quantum Monte Carlo algorithms handle system sizes, where it is impossible 
to calculate the complete two-particle density matrix due to the memory consumptions . 
In order to compare the exact diagonalization results with the quantum Monte Carlo data 
we study two-particle correlation functions for certain symmetries [01], e.g. 



(16) 

j S,5' 

where the index s denotes the on site s-wave symmetry and d the dx2_y2-waye symmetry. 
The factor gs = ±1 gives the signs of the d-wave. (-1-1 in x- and —1 in y-direction) These full 
correlation functions have nonzero values even for a system with no interaction. Responsible 
for this are the one-particle correlation functions 
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CoW = ;^E(cLw) ' (17) 
j 

which decay to zero with ^ and thus do not really contribute to the long range behavior 
that signals superconductivity. To exclude the contribution of the one-particle correlation 
functions we define the vertex correlation function as 

Q(r) = a{r) - Clir) . C^{r) , 

(18) 

Cl{r) = C^{r) - E (gsgs'Clir) ■ Ci(r + 6 - 6')) . 

In figure ^we show Cg and CJ in dependence of the distance |r|. For \J = —8 there is a 
visible difference only for r = 0. For U = —1 the one-particle contributions are dominant. In 
this case it is important to study the vertex correlation function to get the "superconducting" 
correlations. But already for U = —2 and |r| > the difference between full and vertex 
correlation function is less than 30% and it is less important to take in account. 

As a measure for the superconductivity in a system we show in fig. ^ the average of the 
vertex correlation function 

c: = jEc:{^ . (19) 

For a small interaction strength \U\ the increase of the correlation functions is small. 
Between U = —2 and U = —10 there is a strong increase. Finally at \U\ > 10 the curves 
flatten. 

For correlation functions the influence of the additional hopping t' remains important 
even if there is nearly no difference in the energies (cp. fig. ^ and ^ f/ < —10). 

Therefore we study the influence of t' for the interaction U = —4 ( fig. ^). The correlation 
functions have a broad maximum around t' = —0.50. As it is commonly accepted [|T5| the 
finite size gap has an infiuence on the superconducting correlation functions. The finite size 
gap g(t') in the energy dispersion Sk (eq. ^) of the free system {U = 0) between the highest 
occupied state and the lowest unoccupied state is given in this case by: 
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2-4 - It'l for -0.50 < t' < 
9it') = { " " . (20) 

for t' < -0.50 



This means the finite size gap is becoming smaller with increasing \t'\. According to |T5 
this will lead to an increase in the superconducting correlations. Figure |^ confirms this for 
t' > —0.50. But at t' < —0.50 the correlations decrease again. Therefore also the structure 
of the energy dispersion Sk has an influence on the correlation functions. 

In figure ^ the maximum of the correlation functions is not at t' = —0.37, where the 
noninteracting system has a Van Hove singularity in the thermodynamic limet. There a 
maximum of Tc is predicted by the Van Hove scenario ||18|. But for small system sizes one 
cannot really speak of a Van Hove singularity. Therefore the results of figure ^ are not in 
contradiction to the Van Hove scenario. 

Yet, the question remains, whether the observed increase of the correlation function 
results only from a vanishing finite size gap or is related to changes of the Fermi surface. To 
clarify this point it will be necessary to calculate larger systems. 

Next we study the influence of the t' hopping for repulsive interaction U = 4. Quantum 
Monte Carlo calculations show a plateau for the da,2_y2 correlation function in the t — t'- 
Hubbard model ||2^, |^5[|. Figure shows the average correlation functions with 



symmetry. 

Here the full correlation function is nearly independent of t' for t' > —0.4. The vertex 
correlation function is much smaller than in the attractive case (cp. fig. |^ and fig. |^). The 
increase of (7^ by a factor of about 10 between t' = and t' = —.45 is much larger than 
the increase of Cg and (7J in the attractive t — t'- Hubbard model and of Cd in the repulsive 
model. 

In figure ^ the vertex correlation function with d^-i-y-^ symmetry is plotted against the 
distance |r| of the "cooper pairs". For t' = —0.40 the vertex correlation function has, in 
contrast to larger t', no longer a negative value at |r| = \/2 and is positive for all distances 
|r|. This negative value at |r| = \/2 is also seen in the quantum Monte Carlo results for 
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larger systems [|TT| , |^ . 



In the case of t' = —0.50, where the gap g{t') gets zero, C^(|r|) changes its shape 
completely and most values are negative and also the average is negative. In contrast to the 
attractive t — t'-Hubbard model, where the plateau is decreasing gradually, in the repulsive 
case one observes a complete break down of the plateau for t' = —0.5. This means again, 
that the vertex correlation function depends strongly on the energy dispersion Ek (eq.|^), 
which is transformed due to the t'-hopping. As in the attractive t — t'-Hubbard model 
the maximum of the superconducting correlations is not at t' = —0.37, where a Van-Hove 
singularity is in the non interacting infinite system. 

To clarify the origin of this behavior it is necessary to study larger systems. For a possible 
connection with the Van-Hove-scenario it is also necessary to calculate the density of states 
for this parameter regime. 

VI. CONCLUSION 

We have presented an effective algorithm for the implementation of the exact diagonal- 
ization of the t — t'-Hubbard model in momentum-space respresentation. As method for the 
exact diagonalization we use the Lanczos algorithm. We showed a detailed description of the 
parallel algorithm. The speed-up of the code is almost linear for the correlation functions 
and is increasing with increasing size of the Hilbert space. 

The key point in our algorithm is a new method of labeling the states that is compact 



and in contrast to previous methods |^, also gives a consecutive order without any 
interruption. This makes the distribution on the different processes for parallelization a 
straightforward task. With the access to powerful modern parallel computers we were able 
to scan the parameter space of the t — t'-Hubbard model in more detail. 

The influence of the t'-hopping to the ground state energy is vanishing with increasing 
interaction strength in the attractive t — t'-Hubbard model. This is in contradiction to the 
correlation functions with on site s-wave symmetry, where the influence of t' is remaining 
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for all studied attractive interactions. 

The average full and vertex on site s-wave correlation functions have a broad maximum 
at t' = —0.5, where the gap g{t') is vanishing, in the attractive t — t'-Hubbard model. 

In the repulsive t — i'-Hubbard model only the d^2_j^2 vertex correlation functions show a 
strong increase with decreasing t' and gap g{t'). At t' — —0.5 and g{t') — CJ has a break 
down to a negative value. 

The origin of this behavior and a possible connection with the Van-Hove scenario for 
high Tc superconductors is yet not clear. Simulations for larger systems are necessary. 
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TABLES 

TABLE I. Size of the Hilbert space in a 4 x 4 system. M^: number of states for the electrons 

with spin a, number of states witli spin a and momentum k, M: size of the Hilbert space, 

M/^: size of the subspace of the Hilbert space for the states with momentum K = (0, 0). 
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FIGURES 



FIG. 1. CPU-time for one Lanczos-iteration in a 4 x 4 lattice. 

FIG. 2. CPU-time for the determination of the two-particle density matrix Qki,k2MM in a 4 x 4 
lattice. 

FIG. 3. Ground state energy in a 4 x 4 lattice with n-f = rzj^ = 5 electrons for the t — i'-Hubbard 
model. 

FIG. 4. Full (Cs(|r|)) and vertex correlation (CJ(|r|)) function with on site s-wave symmetry 
in a 4 x 4 lattice with n| = = 5 electrons in the t — t'-Hubbard model with t' = —0.22. 

FIG. 5. Average vertex correlation function (CJ) with on site s-wave symmetry in a 4 x 4 
lattice with n-f = nj^ = 5 electrons in the t — t'-Hubbard model. 

FIG. 6. Average full {Cg) and average vertex correlation (Cg) function with on site s-wave 
symmetry in a 4 x 4 lattice with n-f = = 5 electrons in the t — t'-Hubbard model and the 
attractive interaction U = —4. 

FIG. 7. Average full (Cd) and average vertex correlation (C^) function with d3,2_j^2-wave sym- 
metry in a 4 X 4 lattice with n-f = rzj^ = 5 electrons in the t — t'-Hubbard model and the repulsive 
interaction U = 4. 

FIG. 8. Vertex correlation (CJ(|r|)) function with d^2_j^2-wave symmetry in a 4x 4 lattice with 
n| = nj^ = 5 electrons in the t — t'-Hubbard model and U = 4.0. 
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